1. Introduction


Hamming or shortened Hamming codes are widely used for error detection in
data communications. For example, the CCITT (International Telegraph and
Telephone Consultative Committee) recommendation X.25 for packet-switched
data networks adopts a distance-4 cyclic Hamming code with 16 parity-check
bits for error detection [1]. The code is generated either by the polynomial

$$g_1(X) = (X+1)(X^{15}+X^{14}+X^{13}+X^{12}+X^{4}+X^{3}+X^{2}+X+1) = X^{16}+X^{12}+X^{5}+1, \qquad (1)$$

or by the polynomial

$$g_2(X) = (X+1)(X^{15}+X^{14}+1) = X^{16}+X^{14}+X+1, \qquad (2)$$

where $X^{15}+X^{14}+X^{13}+X^{12}+X^{4}+X^{3}+X^{2}+X+1$ and $X^{15}+X^{14}+1$ are
primitive polynomials of degree 15. The natural length of this code is
$n = 2^{15}-1 = 32{,}767$. In practice
the length of a data packet is no more than a few thousand bits which is much 
shorter than the natural length of the code. Consequently, a shortened version 
of the code is used. Often the length of a data packet varies, say from a few 
hundred bits to a few thousand bits, hence the code must be shortened by various 
degrees. Shortening affects the performance of the code. This is the subject 
of investigation in this paper. 
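The factorizations in (1) and (2) can be checked mechanically. The following is a minimal sketch, assuming polynomials over GF(2) are stored as integer bit masks (bit i holds the coefficient of X^i); the helper names `gf2_mul` and `poly_from_exponents` are illustrative and not part of the paper.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials stored as bit masks (bit i = coeff of X^i)."""
    result = 0
    while b:
        if b & 1:          # current coefficient of b is 1: add the shifted copy of a
            result ^= a    # addition over GF(2) is XOR
        a <<= 1            # multiply a by X
        b >>= 1
    return result


def poly_from_exponents(exponents) -> int:
    """Build a bit mask from the list of exponents with nonzero coefficient."""
    mask = 0
    for e in exponents:
        mask |= 1 << e
    return mask


# Degree-15 factors named in the text.
P1 = poly_from_exponents([15, 14, 13, 12, 4, 3, 2, 1, 0])
P2 = poly_from_exponents([15, 14, 0])
X_PLUS_1 = poly_from_exponents([1, 0])

G1 = gf2_mul(X_PLUS_1, P1)   # expected X^16 + X^12 + X^5 + 1
G2 = gf2_mul(X_PLUS_1, P2)   # expected X^16 + X^14 + X + 1
NATURAL_LENGTH = 2**15 - 1   # natural length of the degree-15 Hamming code
```

Here `G1` and `G2` come out as the masks for X^16+X^12+X^5+1 and X^16+X^14+X+1, which agrees with (1) and (2).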

For a random-error channel with bit error rate (or transition probability)
$\epsilon$, it was proved by Korzhik [2] that there exist (n,k) linear codes with
probability $P_e$ of an undetected error satisfying the following upper bound:

$$P_e \le 2^{-(n-k)}\,[1-(1-\epsilon)^{k}] \qquad (3)$$

for all n, k and $\epsilon$ with $0 \le \epsilon \le 1/2$. Korzhik's proof is an existence
proof, and no general method has been found for constructing codes satisfying
the bound given by (3). Only a few classes of known codes [3-6] have been
proved to satisfy a weaker bound,

$$P_e \le 2^{-(n-k)}. \qquad (4)$$

A code is said to be good for error detection if it satisfies the above bound,
because the probability of an undetected error for the code is no greater than
$2^{-(n-k)}$ even for the worst channel condition with $\epsilon = 1/2$. In fact, for
small $\epsilon$, the error probability $P_e$ is much smaller than $2^{-(n-k)}$.
Strict-sense Hamming codes, distance-4 Hamming codes, double-error-correcting
and some triple-error-correcting primitive BCH codes of natural length are
known to satisfy the bound given by (4), and their error probability $P_e$
decreases monotonically as $\epsilon$ decreases [3-6]. Hence these codes are good
error-detecting codes. Using a good error-detecting code with a moderate
number of parity-check bits (say $n-k$ = 16 to 32) in an
automatic-repeat-request (ARQ) system, the probability of an undetected error
can be made very small and virtually error-free data transmission can be
achieved.

Even though a Hamming code of natural length satisfies the error probability
bound, $P_e \le 2^{-(n-k)}$, given by (4), a shortened Hamming code does not
necessarily obey the bound [3]. Whether a shortened Hamming code satisfies the
bound depends on the degree of shortening. Because Hamming codes are normally
used in shortened forms, it is important to know whether a specific shortened
Hamming code satisfies the bound $2^{-(n-k)}$. In this paper we investigate the
probability of an undetected error for shortened Hamming codes, particularly
the shortened Hamming codes generated by the polynomials given by (1) or (2).
A method for computing the probability of an undetected error is presented. We
show that the codes generated by the polynomial given by (1) yield better
performance than the corresponding codes generated by the polynomial given by
(2).

2. Evaluation of Undetected Error Probability of Shortened Cyclic Hamming Codes 

Consider a binary (n,k) linear code C. Let $P(C,\epsilon)$ denote the probability
of an undetected error when code C is used for error detection on a binary
symmetric channel with transition probability $\epsilon$. Let $A_i$ and $B_i$ be the
number of codewords of weight i in C and its dual $C^{\perp}$ respectively. Then
$P(C,\epsilon)$ can be expressed in the following two forms, one in terms of $A_i$
and the other in terms of $B_i$ [7,8,9]:

$$P(C,\epsilon) = \sum_{i=1}^{n} A_i\,\epsilon^{i}(1-\epsilon)^{n-i} \qquad (5)$$

$$\phantom{P(C,\epsilon)} = 2^{-(n-k)} \sum_{i=0}^{n} B_i\,(1-2\epsilon)^{i} - (1-\epsilon)^{n}. \qquad (6)$$

From (5) and (6), we see that, to compute the exact error probability of a
linear code, one needs to know either the weight distribution
$\{A_i : 0 \le i \le n\}$ of the code or the weight distribution
$\{B_i : 0 \le i \le n\}$ of its dual. Theoretically, we can compute the weight
distribution of an (n,k) linear code by examining its $2^{k}$ codewords or by
examining the $2^{n-k}$ codewords of its dual. However, for large n, k and n-k,
the computation becomes practically impossible. Except for some short linear
codes and a few classes of linear codes [8-11], the weight distributions for
most linear codes are still unknown. Consequently, it is very difficult, if
not impossible, to compute the probability of an undetected error for a great
many codes.
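As a small illustration of (5) and (6), the sketch below evaluates both forms for the (7,4) Hamming code, whose weight distribution and that of its dual (the (7,3) maximum-length-sequence code) are well known. The function names are illustrative assumptions, not taken from the paper.

```python
def p_undetected_from_code(A, eps):
    """Form (5): A[i] is the number of codewords of weight i in C, i = 0..n."""
    n = len(A) - 1
    return sum(A[i] * eps**i * (1 - eps)**(n - i) for i in range(1, n + 1))


def p_undetected_from_dual(B, k, eps):
    """Form (6): B[i] is the number of codewords of weight i in the dual of C."""
    n = len(B) - 1
    dual_sum = sum(B[i] * (1 - 2 * eps)**i for i in range(n + 1))
    return 2.0**(-(n - k)) * dual_sum - (1 - eps)**n


# (7,4) Hamming code: weights 0, 3, 4, 7 occur 1, 7, 7, 1 times.
A_HAMMING_7_4 = [1, 0, 0, 7, 7, 0, 0, 1]
# Its dual: the all-zero word plus 7 words of weight 4.
B_DUAL_7_3 = [1, 0, 0, 0, 7, 0, 0, 0]

p_direct = p_undetected_from_code(A_HAMMING_7_4, 0.1)
p_via_dual = p_undetected_from_dual(B_DUAL_7_3, 4, 0.1)
```

Both expressions give the same value, and at $\epsilon = 1/2$ they reduce to $(2^{k}-1)2^{-n} = 15/128$, which is below the bound $2^{-(n-k)} = 1/8$ of (4).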

For Hamming codes, a simple formula for enumerating $A_i$ or $B_i$ is known [8-11],
but no general formula is known for shortened Hamming codes. In general, for
shortened Hamming codes, $n-k < k$. Hence it requires less effort to compute
$\{B_i : 0 \le i \le n\}$ than to compute $\{A_i : 0 \le i \le n\}$. The weight
distribution of the dual of a shortened Hamming code can be computed by
generating all linear combinations of the rows of a parity-check matrix for
moderate values of n-k. We call this the direct method. In the following, more
effective methods for computing the weight distributions of the dual codes of
shortened Hamming codes are presented. The computation is feasible for
moderate values of n-k.

For any positive integer $m \ge 3$, there exists a cyclic Hamming code of length
$2^{m}-1$ and minimum distance 3. The generator polynomial of the code is a
primitive polynomial p(X) of degree m. Let

$$p(X) = \sum_{j=0}^{m} p_j X^{j} \qquad (7)$$

where $p_0 = p_m = 1$. Thus the code is a $(2^{m}-1,\ 2^{m}-m-1)$ code with m
parity-check symbols. Let $C_{2^m-1}$ denote this Hamming code. The dual code of
$C_{2^m-1}$, denoted $C_{2^m-1}^{\perp}$, is a maximum-length-sequence code [8-11]
which consists of the all-zero codeword and $2^{m}-1$ maximum-length-sequences.
Each maximum-length-sequence has weight $2^{m-1}$, and cyclically shifting any
maximum-length-sequence generates all the other maximum-length-sequences.

A distance-4 Hamming code of length $2^{m}-1$ is simply the even weight subcode
of $C_{2^m-1}$. It is generated by the polynomial g(X) = (X+1)p(X) [9,11]. We
denote this code by $C_{2^m-1,e}$. The dual code of $C_{2^m-1,e}$ is the first-order
cyclic Reed-Muller code of length $2^{m}-1$, which consists of the codewords of
$C_{2^m-1}^{\perp}$ and their complements and therefore has minimum weight
$2^{m-1}-1$ [9,11].

For any positive integer n with $m < n < 2^{m}$, let $C_n$ be a shortened (n,n-m)
code of $C_{2^m-1}$. $C_n$ is obtained from $C_{2^m-1}$ by deleting the first
$2^{m}-1-n$ information symbols from each codeword in $C_{2^m-1}$ [9-11]. Let
$A_{n,i}$ and $B_{n,i}$ be the number of codewords of weight i in $C_n$ and its dual
$C_n^{\perp}$ respectively. Let $\beta$ be an element in the Galois field
GF($2^{m}$). The trace of $\beta$, denoted Tr($\beta$), is defined as follows:

$$\mathrm{Tr}(\beta) = \sum_{j=0}^{m-1} \beta^{2^{j}}, \qquad (8)$$

which is either 0 or 1. Let $\alpha$ be a root of p(X) and let

$$a_i = \mathrm{Tr}(\alpha^{i}) \qquad (9)$$

for $0 \le i < 2^{m}-1$. Since $p(\alpha^{2^{h}}) = 0$ for $0 \le h < m$, it follows
from the linearity of the trace Tr($\cdot$) that, for $0 \le i < 2^{m}-1$,

$$a_{i+m} = \sum_{j=0}^{m-1} p_j\, a_{i+j}, \qquad (10)$$

where the suffixes are to be taken modulo $2^{m}-1$. It is known [8] that for a
nonnegative integer u less than $2^{m}$,
$v = (a_u, a_{u+1}, \ldots, a_{u+2^m-2})$ is a maximum-length-sequence in
$C_{2^m-1}^{\perp}$. Since the weight of $v = (a_u, a_{u+1}, \ldots, a_{u+2^m-2})$ is
$2^{m-1}$, it is easy to see that

$$B_{2^m-1-n,\,i} = B_{n,\,2^{m-1}-i}. \qquad (11)$$


For $1 \le i < 2^{m}$, let $N_i$ denote the weight of
$(a_u, a_{u+1}, \ldots, a_{u+i-1})$, which is a prefix of v. Let $N_0 = 0$. Then
$B_{n,i}$ is equal to the number of occurrences of integer i in the sets

$$\{\, N_{j+n} - N_j : 0 \le j < 2^{m}-1-n \,\}, \qquad (12)$$

and

$$\{\, 2^{m-1} - N_j + N_{j-2^m+1+n} : 2^{m}-1-n \le j < 2^{m}-1 \,\}. \qquad (13)$$

For instance, we can choose u such that $a_{u+i} = 0$ for $0 \le i < m-1$ and
$a_{u+m-1} = 1$.

Now we estimate the order of computation time for finding $N_j$. Let $\rho$ be
the number of nonzero coefficients of the generator polynomial. We consider
the following two methods.


Method-I

For small $\rho$, we can generate $a_{u+i}$ with $0 \le i < 2^{m}$ by using the
recurrence formula

$$a_{u+i} = \sum_{j=0}^{m-1} p_j\, a_{u+i+j-m},$$

and obtain $N_{i+1}$ from $N_i$ by increasing $N_i$ by one only if $a_{u+i}$ is one.
Then the computing time is upper bounded by $c_0 \rho 2^{m}$, where $c_0$ is a
constant. Hereafter, $c_j$ denotes a constant.
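The first method, combined with the counting rule of (12) and (13), can be sketched as follows. This is an illustration under stated assumptions rather than the authors' program: the function names are hypothetical, `p_coeffs` holds $(p_0, \ldots, p_{m-1})$ with $p_m = 1$ implied, the starting state is the choice of u suggested above (m-1 zeros followed by a one), and the all-zero codeword is counted separately as `B[0] = 1`.

```python
def m_sequence(p_coeffs):
    """One period (a_u, ..., a_{u+2^m-2}) from a_i = sum_j p_j a_{i+j-m} (mod 2)."""
    m = len(p_coeffs)
    taps = [j for j, c in enumerate(p_coeffs) if c]
    a = [0] * (m - 1) + [1]                  # a_u = ... = a_{u+m-2} = 0, a_{u+m-1} = 1
    for i in range(m, 2**m - 1):
        a.append(sum(a[i - m + j] for j in taps) & 1)
    return a


def prefix_weights(a):
    """N[i] = weight of the first i symbols, with N[0] = 0."""
    N = [0]
    for bit in a:
        N.append(N[-1] + bit)                # increase by one only if the symbol is 1
    return N


def dual_weight_distribution(p_coeffs, n):
    """B[i] = number of weight-i codewords in the dual of the shortened code C_n."""
    m = len(p_coeffs)
    period = 2**m - 1
    N = prefix_weights(m_sequence(p_coeffs))
    B = [0] * (n + 1)
    B[0] = 1                                 # all-zero codeword (assumption: counted here)
    for j in range(period - n):              # set (12): windows that do not wrap around
        B[N[j + n] - N[j]] += 1
    for j in range(period - n, period):      # set (13): windows that wrap around
        B[2**(m - 1) - N[j] + N[j - period + n]] += 1
    return B


# Example: p(X) = X^3 + X + 1 (m = 3), shortened to n = 4.
B_EXAMPLE = dual_weight_distribution([1, 1, 0], 4)
```

For this small case the (4,1) shortened code has the single nonzero codeword 1101, and the resulting dual distribution is `[1, 1, 3, 3, 0]`, which can be confirmed by listing the eight vectors orthogonal to 1101.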

Method-II

If m and $\rho$ are large, then we can use the following procedure for computing
$N_j$. We assume that (i) the word length of the computer is $2^{h}$ or greater,
where $0 < h < m < 2^{m-h}$, and (ii) the word operations "bit-wise Logical-AND"
and "bit-wise Exclusive-OR" are available. For $0 \le i < 2^{m-h}$, let

$$\mathbf{a}_i = (a_{i2^h+u},\ a_{i2^h+u+1},\ \ldots,\ a_{i2^h+u+2^h-1}). \qquad (14)$$

Then it follows from (10) that

$$\mathbf{a}_{i+m} = \sum_{j=0}^{m-1} p_j\, \mathbf{a}_{i+j}. \qquad (15)$$

We first generate $(0, 0, \ldots, 0, 1, a_{u+m}, a_{u+m+1}, \ldots, a_{u+m2^h-1})$,
i.e., $\mathbf{a}_0, \mathbf{a}_1, \ldots, \mathbf{a}_{m-1}$, by using

$$a_{u+i+m} = \sum_{j=0}^{m-1} p_j\, a_{u+i+j}.$$

The computing time is upper bounded by $c_1 \rho m 2^{h}$. Next we compute
$\mathbf{a}_m, \mathbf{a}_{m+1}, \ldots, \mathbf{a}_{2^{m-h}-1}$ by using (15), the
computing time of which is upper bounded by $c_2 \rho (2^{m-h}-m)$.

From $\mathbf{a}_i$ with $0 \le i < 2^{m-h}$, $N_1, N_2, \ldots, N_j, \ldots, N_{2^m-1}$
can be found sequentially as follows. Let j be $i2^{h}+r$, where $0 \le r < 2^{h}$.
Then $N_{j+1}$ can be obtained from $N_j$ by extracting the (r+1)-th bit of
$\mathbf{a}_i$. $N_j$ is increased by one if and only if the result of the
extraction is nonzero. The computing time is $c_3 2^{m}$. Thus the total
computing time is at most $c_1 \rho m 2^{h} + c_2 \rho (2^{m-h}-m) + c_3 2^{m}$. For
most cases, the first term is much smaller than the other terms.
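A word-parallel version of this procedure can be sketched as below. It is an illustration only: Python integers stand in for machine words of $2^{h}$ bits, bit r of word i holds $a_{i2^h+u+r}$, and the function names are hypothetical. A bit-serial routine is included so the two results can be compared.

```python
def prefix_weights_bit_serial(p_coeffs):
    """Reference: N[0..2^m-1] obtained one symbol at a time (first method)."""
    m = len(p_coeffs)
    taps = [j for j, c in enumerate(p_coeffs) if c]
    a = [0] * (m - 1) + [1]
    for i in range(m, 2**m - 1):
        a.append(sum(a[i - m + j] for j in taps) & 1)
    N = [0]
    for bit in a:
        N.append(N[-1] + bit)
    return N


def prefix_weights_word_parallel(p_coeffs, h):
    """Second method: build words of 2^h symbols, extend them with (15), then count."""
    m = len(p_coeffs)
    word_len = 1 << h
    num_words = 1 << (m - h)
    if not (0 < h < m <= num_words):
        raise ValueError("need 0 < h < m and at least m words")
    taps = [j for j, c in enumerate(p_coeffs) if c]

    # Step 1: symbols of the first m words from the symbol-level recurrence.
    bits = [0] * (m - 1) + [1]
    for i in range(m, m * word_len):
        bits.append(sum(bits[i - m + j] for j in taps) & 1)
    words = []
    for i in range(m):
        w = 0
        for r in range(word_len):
            w |= bits[i * word_len + r] << r
        words.append(w)

    # Step 2: remaining words by the word-level recurrence (15), one XOR per tap.
    for i in range(num_words - m):
        w = 0
        for j in taps:
            w ^= words[i + j]
        words.append(w)

    # Step 3: extract bit r of word i for symbol index j = i * 2^h + r.
    N = [0]
    for j in range((1 << m) - 1):
        i, r = divmod(j, word_len)
        N.append(N[-1] + ((words[i] >> r) & 1))
    return N
```

For example, with $p(X) = X^{5}+X^{2}+1$ and h = 2, `prefix_weights_word_parallel([1, 0, 1, 0, 0], 2)` agrees with the bit-serial result. The saving comes from step 2, where each XOR processes $2^{h}$ symbols at once.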

If $c_2 \rho 2^{-h} + c_3 < c_0 \rho$, then the second method is more effective than
the first one. For both methods, $\{B_{n,i} : 0 \le i < 2^{m}\}$ can be found from
(11) to (13) in $c_4 2^{m}$ computing time. Hence, the total computing time for
finding $\{B_{n,i} : 0 \le i < 2^{m}\}$ for q different code-lengths is upper bounded
as follows:

(1) For the first method,

$$(c_0 \rho + c_4 q)\,2^{m}. \qquad (16)$$

(2) For the second method,

$$c_1 \rho m 2^{h} + (c_2 \rho 2^{-h} + c_3 + c_4 q)\,2^{m}. \qquad (17)$$

Now we compare the above methods with a "direct method" for computing
$\{B_{n,i} : 0 \le i < 2^{m}\}$ which generates all linear combinations of the rows
of a parity-check matrix of $C_n$. The computing time for generating a
parity-check matrix is upper bounded by $c_5 \rho m n$. To generate all linear
combinations of the rows efficiently, we can use the Gray code in such a way
that a new combination is obtained from the preceding one by adding a row to
it [12,13]. If we use the word operations bit-wise logical-AND and bit-wise
Exclusive-OR, then the computing time is proportional to $n 2^{m}/\ell$, where
$\ell$ is the word length of the computer. We assume that the set of code
lengths $n_j$ with $1 \le j \le q$ for which $\{B_{n_j,i} : 0 \le i < 2^{m}\}$ is to be
found is given beforehand. Note that we do not need this assumption for the
methods described above. If we use the word operation "find the weight of a
word", then the order of the total computing time for finding the
distributions $\{B_{n_j,i} : 0 \le i < 2^{m}\}$ for $1 \le j \le q$ can be estimated as

$$c_5 \rho m\, n_{\max} + (c_6\, n_{\max}/\ell + c_7 q)\,2^{m} \qquad (18)$$

where

$$n_{\max} = \max\{\, n_j,\ 2^{m}-1-n_j : 1 \le j \le q \,\}. \qquad (19)$$


Since $c_2 \approx c_6$, $c_4 \approx c_7$ and $n_{\max}/\ell$ is much greater than $\rho$,
for most cases the first or second method is more efficient than the direct
method, at least if $c_3 < n_{\max}/\ell$.
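For comparison, the direct method with Gray-code enumeration can be sketched as follows. The construction of the parity-check matrix (column i equal to $X^{i} \bmod p(X)$, with the n lowest-order positions kept) and the function names are assumptions made for this illustration; each row is stored as an integer bit mask so that adding a row is a single XOR.

```python
def parity_check_rows(p_mask, m, n):
    """Rows of an m x n parity-check matrix; column i is X^i mod p(X).

    p_mask is the bit mask of p(X) including the X^m term.
    """
    rows = [0] * m
    state = 1                                # X^0
    for i in range(n):
        for r in range(m):
            if (state >> r) & 1:
                rows[r] |= 1 << i            # entry (r, i) of the matrix is 1
        state <<= 1                          # multiply by X
        if (state >> m) & 1:
            state ^= p_mask                  # reduce modulo p(X)
    return rows


def dual_weights_gray(rows, n):
    """Weight distribution of the row space, visiting combinations in Gray-code order."""
    B = [0] * (n + 1)
    B[0] = 1                                 # the empty combination (all-zero word)
    codeword = 0
    for t in range(1, 1 << len(rows)):
        flip = (t & -t).bit_length() - 1     # index of the row that changes at step t
        codeword ^= rows[flip]               # one row addition per new combination
        B[bin(codeword).count("1")] += 1     # "find the weight of a word"
    return B


# Example: p(X) = X^3 + X + 1, shortened length n = 4.
B_DIRECT = dual_weights_gray(parity_check_rows(0b1011, 3, 4), 4)
```

On this example the result is `[1, 1, 3, 3, 0]`. The loop performs $2^{m}-1$ row additions on n-bit vectors, which is the $n 2^{m}/\ell$ behaviour described above when each vector occupies $n/\ell$ machine words.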

Let $C_{n,e}$ denote the even weight subcode of $C_n$. In fact, $C_{n,e}$ is a
shortened code of the distance-4 Hamming code $C_{2^m-1,e}$ generated by
g(X) = (1+X)p(X). The number of codewords of weight i in the dual code
$C_{n,e}^{\perp}$ of $C_{n,e}$, denoted $B_{n,i,e}$, is

$$B_{n,i,e} = B_{n,i} + B_{n,n-i}. \qquad (20)$$


For $15 < n < 2^{15}$, let $C_{n,e}^{(1)}$ and $C_{n,e}^{(2)}$ be the even weight
shortened codes of length n generated by $g_1(X)$ of (1) and $g_2(X)$ of (2)
respectively. For
$p_1(X) = X^{15}+X^{14}+X^{13}+X^{12}+X^{4}+X^{3}+X^{2}+X+1$ and
$p_2(X) = X^{15}+X^{14}+1$, the $N_i$'s with $1 \le i < 2^{15}$ are computed by the
first method. From these $N_i$'s, the weight distributions
$\{B_{n,i,e} : 0 \le i \le n\}$ of the dual codes of $C_{n,e}^{(1)}$ and
$C_{n,e}^{(2)}$ for $16 < n < 2^{15}$ are obtained. From these weight distributions
and (6), the error probabilities, $P(C_{n,e}^{(1)},\epsilon)$ and
$P(C_{n,e}^{(2)},\epsilon)$, are computed and plotted in Figures 1 and 2
respectively as functions of channel bit-error-rate $\epsilon$ for code length
$n = 2^{\ell}$ with $5 \le \ell \le 14$ and $n = 2^{15}-1$.
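The whole computation for the code generated by $g_1(X)$ can be sketched end to end: the first method gives the $N_i$'s, (12) and (13) give $B_{n,i}$, (20) gives $B_{n,i,e}$, and (6) gives the error probability. The code below is a minimal sketch under these assumptions, not the authors' program; the function names are hypothetical and the all-zero codeword is counted separately as `B[0] = 1`.

```python
def prefix_weights(p_exponents, m):
    """N[i] = weight of the first i symbols of one period (first method)."""
    taps = [e for e in p_exponents if e < m]
    a = [0] * (m - 1) + [1]
    for i in range(m, 2**m - 1):
        a.append(sum(a[i - m + j] for j in taps) & 1)
    N = [0]
    for bit in a:
        N.append(N[-1] + bit)
    return N


def dual_distribution(N, m, n):
    """B_{n,i} from the sets (12) and (13)."""
    period = 2**m - 1
    B = [0] * (n + 1)
    B[0] = 1                                   # all-zero codeword
    for j in range(period - n):
        B[N[j + n] - N[j]] += 1
    for j in range(period - n, period):
        B[2**(m - 1) - N[j] + N[j - period + n]] += 1
    return B


def even_subcode_dual(B):
    """Equation (20): B_{n,i,e} = B_{n,i} + B_{n,n-i}."""
    n = len(B) - 1
    return [B[i] + B[n - i] for i in range(n + 1)]


def p_undetected(B_dual, k, eps):
    """Equation (6) evaluated from the dual weight distribution."""
    n = len(B_dual) - 1
    s = sum(b * (1 - 2 * eps)**i for i, b in enumerate(B_dual) if b)
    return 2.0**(-(n - k)) * s - (1 - eps)**n


M = 15
P1_EXPONENTS = [15, 14, 13, 12, 4, 3, 2, 1, 0]   # p_1(X) from the text
N1 = prefix_weights(P1_EXPONENTS, M)


def x25_g1_p_undetected(n, eps):
    """P(C_{n,e}^{(1)}, eps) for the length-n code with 16 parity bits (k = n - 16)."""
    B_e = even_subcode_dual(dual_distribution(N1, M, n))
    return p_undetected(B_e, n - 16, eps)
```

A call such as `x25_g1_p_undetected(22, 0.185)` evaluates the n = 22 code near the peak location listed in Table 1. Because (6) subtracts two nearly equal quantities when $\epsilon$ is very small, double precision may lose accuracy there; exact rational arithmetic (for example `fractions.Fraction`) is one possible remedy.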

From Figures 1 and 2, we see that if the two distance-4 Hamming codes
recommended by CCITT X.25 are shortened too much, the shortened codes do not
obey the bound $2^{-(n-k)}$ given by (4), i.e., their error-detection performance
becomes poor as $\epsilon$ becomes large. Therefore, in order to maintain the data
reliability, the length of a data packet should not be too short. In Tables 1
and 2, we tabulate some code lengths for which the error probabilities,
$P(C_{n,e}^{(1)},\epsilon)$ and $P(C_{n,e}^{(2)},\epsilon)$, do not obey the bound
$2^{-(n-k)}$ given by (4). We also tabulate the peak values of the error
probabilities and the values of channel bit-error-rate $\epsilon$ where the peak
values occur. Note that in most cases, peak values occur for
$4/n < \epsilon < 5/n$. For the longer values $n = 2^{\ell}$ with $8 \le \ell \le 14$ for
$C_{n,e}^{(1)}$ and with $10 \le \ell \le 14$ for $C_{n,e}^{(2)}$, no peak is detected
within accuracy in computation. The peak values of the probability of
undetected error for $C_{n,e}^{(1)}$ and $C_{n,e}^{(2)}$ are plotted in Figure 3 as
functions of code length n. From Tables 1 and 2 and Figures 1-3, we see that
the codes generated by the polynomial $g_1(X)$ of (1) give better performance
than the corresponding codes generated by the polynomial $g_2(X)$ of (2).

3. Conclusion 

In this paper, we have investigated the error-detection performance of
shortened Hamming codes, particularly the shortened codes obtained from the
two distance-4 Hamming codes adopted by CCITT recommendation X.25. First, two
methods for computing the weight distributions of the dual codes of shortened
Hamming codes have been presented. We have shown that these methods are in
general more effective than the direct method. Using the weight distributions
of the dual codes, we have evaluated the probability of undetected error for
the codes obtained from shortening the two X.25 distance-4 Hamming codes. We
have shown that shortening does affect the error-detection performance of the
two X.25 codes. If the codes are shortened too much, the shortened codes do
not obey the bound $2^{-(n-k)}$. We have also shown that the codes generated by
$g_1(X) = X^{16}+X^{12}+X^{5}+1$ give better performance than the corresponding
codes generated by $g_2(X) = X^{16}+X^{14}+X+1$.

Table 1

The maximum values of $P(C_{n,e}^{(1)},\epsilon)$ for $0 < \epsilon \le 1/2$

    n      ε               P(C^(1)_{n,e}, ε)
    22     1.85 × 10^-1    1.82 × 10^-4
    24     1.71 × 10^-1    1.69 × 10^-4
    26     1.59 × 10^-1    1.50 × 10^-4
    28     1.48 × 10^-1    1.31 × 10^-4
    30     1.39 × 10^-1    1.15 × 10^-4
    32     1.30 × 10^-1    1.00 × 10^-4
    40     1.05 × 10^-1    7.83 × 10^-5
    50     8.55 × 10^-2    5.12 × 10^-5
    64     7.03 × 10^-2    3.18 × 10^-5
    128    4.55 × 10^-2    1.70 × 10^-5






Table 2

The maximum values of $P(C_{n,e}^{(2)},\epsilon)$ for $0 < \epsilon \le 1/2$

    n      ε               P(C^(2)_{n,e}, ε)
    22     1.98 × 10^-1    2.10 × 10^-4
    24     1.84 × 10^-1    1.94 × 10^-4
    26     1.70 × 10^-1    1.72 × 10^-4
    28     1.58 × 10^-1    1.50 × 10^-4
    30     1.46 × 10^-1    1.38 × 10^-3
    32     1.36 × 10^-1    1.64 × 10^-4
    40     1.08 × 10^-1    1.89 × 10^-4
    50     8.63 × 10^-2    1.67 × 10^-4
    64     6.73 × 10^-2    1.32 × 10^-4
    128    3.47 × 10^-2    5.12 × 10^-?
    256    1.93 × 10^-2    2.28 × 10^-5
    512    1.19 × 10^-2    1.67 × 10^-5





Figure 1   Actual values of probability of undetected error for
           the shortened cyclic Hamming code of length n generated
           by $g_1(X) = 1+X^{5}+X^{12}+X^{16}$ as a function of channel bit
           error rate $\epsilon$. (Vertical axis: probability of undetected
           error; horizontal axis: bit-error-rate $\epsilon$.)


Figure 2   Actual values of probability of undetected error for
           the shortened cyclic Hamming code of length n generated
           by $g_2(X) = 1+X+X^{14}+X^{16}$ as a function of channel bit
           error rate $\epsilon$.

Figure 3   The peak values of the probability of undetected error
           for $C_{n,e}^{(1)}$ and $C_{n,e}^{(2)}$. (Vertical axis: the peak
           values of the probability of undetected error; horizontal
           axis: code length n, logarithmic scale from 10 to 1000.)

